AppNo.: 09/657,472 

Title: SINGLE NUCLEOTIDE... 

Inventors: Eric S. Lander, et al. 

HT1220 Report 



RECORD INFORMATION 



Gene ID: 1220
Sequence ID: 1220
Protein ID: 1220
Sequence name: thrombospondin 1, alt. transcript 1
Genome: nucleus
Taxon: Homo sapiens
Locus: 1220
Common Name: thrombospondin 1
Role ID: 40



Coding sequence length: 3513 nt 

Transcript sequence length: 5722 nt 
Expression data: THC201573



ACCESSION DATA 

HT1220 is derived from accession(s):

SP:P07996 (THROMBOSPONDIN 1 PRECURSOR.)
GB:X04665 (Human mRNA for thrombospondin)
GB:X14787 (Human mRNA for thrombospondin)
GB:U12471 (thrombospondin-p50 {Homo sapiens})
GB:M99425 (Human thrombospondin mRNA, 3' end.)
PIR:G01478 (thrombospondin-p50 - human (fragment))
GB:U12471 (Human thrombospondin-1 gene, partial cds.)
GB:J04335 (Human thrombospondin gene, exons 1, 2 and 3.)
GB:M25631 (Homo sapiens (clone lambda-TS-33) thrombospondin (THBS) mRNA, 5' end)



ALTERNATIVE SPLICE INFORMATION 

Alternative splice forms for this gene: 

HT3987 thrombospondin 1, alt. transcript 2 



MAPPING DATA 

GDB accession(s) for this gene: 

GDB ID        Symbol



Figure 1A 



gdb:120438    THBS1



cDNA FEATURES 

Feature       End 5    End 3
coding_seq    112      3624
3'UT          3625     5722
spjuncji      1235     1236
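The feature coordinates can be cross-checked against the lengths stated in the record. The following is a minimal sketch, assuming the End 5 and End 3 columns are 1-based and inclusive; the names `FEATURES`, `span_length` and `check_report` are illustrative and do not come from the report.

```python
# Values transcribed from the HT1220 report. The 1-based, inclusive
# coordinate convention is an assumption, not something the report states.
FEATURES = {"coding_seq": (112, 3624), "3'UT": (3625, 5722)}
CODING_LENGTH = 3513       # "Coding sequence length" in the report
TRANSCRIPT_LENGTH = 5722   # "Transcript sequence length" in the report


def span_length(start, end):
    """Length of a feature whose start and end are both inclusive."""
    return end - start + 1


def check_report(features, coding_length, transcript_length):
    """Return a dict of named consistency checks, each True or False."""
    cds_start, cds_end = features["coding_seq"]
    utr_start, utr_end = features["3'UT"]
    return {
        "cds_length_matches": span_length(cds_start, cds_end) == coding_length,
        "cds_is_whole_codons": coding_length % 3 == 0,
        "utr_follows_cds": utr_start == cds_end + 1,
        "utr_ends_at_transcript_end": utr_end == transcript_length,
    }


if __name__ == "__main__":
    results = check_report(FEATURES, CODING_LENGTH, TRANSCRIPT_LENGTH)
    for name, ok in results.items():
        print(f"{name}: {ok}")
```

Under that convention the coding span 112 to 3624 covers 3513 nt, which agrees with the stated coding sequence length and divides evenly into codons.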



SEQUENCE 
nucleotide: 

ggacgcacaggcattccccgcgcccctccagccctcgccgccctcgccaccgctcccggc

cgccgcgctccggcacacacaggatccctgctgggcaccaacagctccaccatggggctg 

gcctggggactaggcgtcctgttcctgatgcatgtgtgtggcaccaaccgcattccagag 

tctggcggagacaacagcgtgtttgacatctttgaactcaccggggccgcccgcaacggg 

tctgggcgccgactggtgaagggccccgacccttccagcccagctttccgcatcgaggac 

gccaacctgatcccccctgtgcctgatgacaagttccaagacctggtggatgctgtgcgg 

gcagaaaagggtttcctccttctggcatccctgaggcagatgaagaagacccggggcacg 

ctgctggccctggagcggaaagaccactctggccaggtcctcagcgtggtgtccaatggc 

aaggcgggcaccctggacctcagcctgaccgtccaaggaaagcagcacgtggtgtctgcg 

gaagaagctctcctggcaaccggccagtggaagagcatcaccctgtttgtgcaggaagac 

aggacccagctgtacatcgactgtgaaaagatggagaatgctgagttggacgtccccatc 

caaagcgtcttcaccagagacctggccagcatcgccagactccgcatcgcaaaggggggc 

gtcaatgacaatttccagggggtgctgcagaatgtgaggtttgtctttggaaccacacca 

gaagacatcctcaggaacaaaggctgctccagctctaccagtgtcctcctcacccttgac 

aacaacgtggtgaatggttccagccctgccatccgcactaactacattggccacaagaca 

aagcacttgcaacccatctgcggcatctcctgtgatgagctgtccagcatggtcctggaa 

ctcagaggcctgcgcaccattgtgaccacgctgcaggacagcatccgcaaagtgactgaa 

gacaacaaagagttggccaatgagctgaggcggcctcccctatgctatcacaacggagtt 

cagtacagaaataacgaggaatggactgttgatagctgcactgagtgtcactgtcagaac

tcagttaccatctgcaaaaaggtgtcctgccccatcatgccctgctccaatgccacagtt 

cctgatggagaatgctgtcctcgctgttggcccagcgactctgcggacgatggctggtcc

ccatggtccgagtggacctcctgttctacgagctgtggcaatggaattcagcagcgcggc 

cgctcctgcgacagcctcaacaaccgatgtgagggctcctcggtccagacacggacctgc 

cacattcaggagtgtgacaaaagatttaaacaggatggtggctggagccactggcccccg 

tggtcatcttgttctgtgacatgtggtgatggtgtgatcacaaggatccggctctgcaac 

tctcccagcccccagatgaatgggaaaccctgtgaaggcgaagcgcgggagaccaaagcc 

tgcaagaaagacgcctgccccaccaatggaggctggggtccttggtcaccatgggacatc 

tgttctgtcacctgtggaggaggggtacagaaacgtagtcgtctctgcaacaaccccgca

ccccagtttggaggcaaggactgcgccggtgatgtaacagaaaaccagatctgcaacaag 

caacactgtccaattgatggatgcctgtccaatccctgctttgccggcgtgaagtgtact

agctaccctgatggcagctggaaacgtggtgcttgtcGCCCtggttacagtggaaatggc 

atccagtgcacagatgttgatgagtgcaaagaagtgcctgatgcctgctccaaccacaat

ggacagcaccggtgtgagaacacggaccccggctacaactgcctgccctgccccccacgc

ttcaccgactcacagcccttcggccagggtgtcgaacatgccacggccaacaaacaggtg 

tgcaagccccgtaacccctgcacggatgggacccacgaccgcaacaagaacgccaagtgc 

aactacctgggccactatagcgaccccatgtaccgctgcgagtgcaagcctggctacgct

aacaatggcatcatcngcggggaggacacagacctggatggctggcccaatgagaacctg

gtatgcgcggccaatgcgactcaccactgcaaaaaggataactgccccaaccttcccaac

tcaacjgcaggaagactatgacaaggatggaattggtgatgcctgtgacgatgacgatgac 

aatgataaaattccagatgacagggacaactgtccattccattacaacccagctcagtat

gactatgacaaaaatgatgtgggagaccgctgtgacaactgcccctacaaccacaaccca 
gatcaggcagacacagacaacaatggggaaggagacgcctgtgctgcagacatcgatgga

Lcggtatcctcaatgaacgggacaactgccagtacgtctacaatgtggaccagagagac 

LciLatggatggggttggagatcagtgtgacaattgccccttggaacacaatccggat 

iagctggactctgactcagaccgcattggagatacctgtgacaacaatcaggatattgat 

gaagatggccaccagaacaacctggacaactgtccctatgtgcccaatgccaaccaggct 

laccatlacaaagatggcaagggagatgcctgtgaccacgatgatgacaacgatggcatt 

Ltgatlacaaggacaactgcagactcgtgcccaatcccgaccagaaggactctgacggc 

gatggtcgaggtgatgcctgcaaagatgattttgaccatgacagtgtgccagacatcgat 

lacIIctgtStgagaatgttgacatcagtgagaccgacttccgccgattccagatgaCC 

IctctggaccccaaagggLatcccaaaatgaccctaactgggttgtacgccatcagggt 

^agStcg C =cagal? gt caactgtgatcctggactcgctgtaggttatgatgagct t 

aatgctgtggacttcagtggcaccttcttcatcaacaccgaaagggacgatgactatgct

gga?ttIt?LtggctLcIgtccagcagccgctt t tatgttgtgatgtggaag=aagt= 

Icccagtcctactaggacaccaaccccacgagggctcagggatactcgggcctttctgtg 

aaagt^gtaaactccaccacagggcctggcgagcacctgcggaacgccctgtggcacaca 

ggalacacccctggccaggtgcgcaccctgtggcatgaccctcgtcacataggctggaaa 

gltetcaccgccScagatggcgtc^^ 

gtgatgtatgaagggaagaaaatcatggctgactcaggacccatctacgataaaacctat 
IcSSggtlgalllgggttgttcgtcttctctcaagaaatggtgttcttctctgacctg 
laSIcglatgtagagatccStaatcatcaaattgttgattgaaagactgatcataaacc 

iaStStaftgcaccttctggaactatgggcttgagaaaacccccaggatcacttctc 
"tScftccttLtttctgtgcttgcatcagtgtggactcctagaacgtgcgacctgcc 
tcaSgaaaatgcagttttcaaaaacagactcatcagcattcagcctccaatgaataagac 
atcwcwagcatltaaacaattgctttggtttccttttgaaaaagcatctacttgcttc 
agttgagaaggtgcccat-ccactctgcctttgtcacagagcagggtgctattgtgaggc 
cltc?ctgagcagtggactcaaaagcattctcaggcatgtcagagaagggaggactcact 
^aaattalclaaLalaccaccctgacatcctccttcaggaacacggggagcagaggcca 
aagcactlaggggagggcgcatacccgagacgattgtatgaagaaaatatggaggaact^ 
ttacatgttcggtactaagtcactttcaggggatcgaaagactattgctggatttcatga 
tactgactggcgttagctaattaacccatgtaaataggcacttaaatagaagcaggaaag 
gSgKaaSactgglttctggactccctccctgatccccacccttactcatcaccttgc 
aatqqccaaaattagaaaatcagaatcaaaccagtgtaaggcagtgctggctgccattgc 
^gglcacattgaalttggtggcttcattctagatgtagcttgtgcagatgtagcaggaa 
aaSggaaaacctaccateccagtgagcaccagctgcctcccaaaggaggggcagccgt* 
cttatatttttatgattacaatggcacaaaattattaccaacctaactaaaacattcct. 
ttctctcttttccgtaattactaggtagttttctaattctctcttttggaagtatgautt 
ttctaaagtctttacgatgtaaaatatttattttttacttattctggaagatctggctga 
- agga^aScacggaLalgaagaagcgtaaagactatccatgtcatctttgttgagagc 
cttc-tqactgtaagattgtaaatacagattatttattaactctgttctgcctggaaat,. 
taggctlcatLgglaag^ttgagagcaagtagttgacatttatcagcaaatctcctg 
caaSaacagcacLggaaaatcagtctaataagctgctctgccccttgtgctcagagtgg 
atatcacgggattccttttttctctgtctcaccttttcaagcggaactagttggutatcc 
Slt^aSgttttaaatrgcaaagaaagccatgaggtcttcaatactgttttacccca 
Jcccltgtgcltatttccagggagaaggaaagcatatacacttttttctcccattt^cc 
LaagalaLaaaatgacaaaaggtgaaacttacatacaaatattacctcatttgtcgtg 
tgactgagtaaaaaatttttggaccaagcggaaagagtttaagtgtctaacaaacttaaa 
glJaccgLgtacctaaaaagtcagtgttgtacatagcataaaaactctgcagagaagta 
wcccaataloaaaacagcattgaaatgttaaatacaatttctgaaagttacgt.uc.cc 

tagaatattcalattgtgtagatatgccatttaaataatttatcaggaaatactgcctgc 

aalgtcagtatttctatttctatacaacgtttgcacactgaattgaagaatcgttgst.t 

ctatttgccaatacccccccctaggaatgtgctcttctttgtacacatttctatccatc. 
ta^at'itaaagcagtgtaagtcgnataccac.gtccctcatgtacaaggaacaacaata 

aatcatatggaaatttatattt 
protein: 

' MGLAWGLGVLFLMHVCGTNRIPESGGDNSVTDIFBLTGAARKGSGRRLVKGPDPSSP.^ 

Figure 1C 
IEDANLIPPVPDDKFQDLVDAVRAEKGFLL

' SNGKAGTIJDLSL TVQGKQHVVSVEEAIiLATGQWKS ITLFVQEDRAQLY IDCEKMHNAELD 
VPIQSVTTRDLASIARIxRIAKGG 

TLDNNWNGS S P AIRTNYIGHKTKDLQAI CGIS CDELS SMVLELRGLRTIVTTLQDSIRK 
VTEElJKEIoANELRRPPLCYHNGVQYRN^ 

ATVPDGECCPRCWPSDSADDGWSPWSEWTSCSTSCGNGIQQRGRSCDSLNKRCEGSSVQT 
RTCHIQECDKRFKQDGGWSHWSPWSSCSVTCGDGVITRIRI.CNSPSPQMNGKPCEGEARE 
TKACKKD ACP INGGWGP WS PWD I CS VTCGGGVQKRSRLCNNPAPQFGGKDCVGD VTENQ I 
CNKQDCPIDGCLSNPCFAGVKCTSYPDGSWKCGACPPGYSGNGIQCTDVDECKEVPDACF 
NHNGEHRCENTDPGYNCLPCPPRFTGSQPFGQGVEHATANKQ 

AKCNYLGHYSD PMYRCE CKP GYAGNG 1 1 CGEDTDLDGWPNENLVCVANATYHCKKDNCPN 
LPNSGQEDYBKDGIGDACDDDDDNDKXPDDRDNCPFHYOTAQYDYDRDDVGDRCDNCPYN 
HNPDQADTDNNGEGDACAADIDGDGILNERDNCQYVYl^ 
NPDQLDSDSDRIGDTCDNNQDIDEDGHQNK^ 

DGIPDDKDNCRLVPNPDQKI)SDGDGRGDACKDDFDHDSVPDIDDICPENVDXSETDFRRF 
QMIPIiDP KGT S QND PNVTVVRHQGKELVQTVNCD PGIxAVGYDEFNAVDF SGTFF INTERDD 
DYAGFVFGYQSSSRFYVVMWKQVTQSYWDTNPTRAQGYSGLSVKVTO 
WHTGNTPGQVRTLWHDPRHIGWKDFTAYRWRLSHRPKTGFIRVVMYEGKKIMADSGPITO 
KTYAGGRLGLFVFSQEMVFFSDLKYECRDP 
HT2143 Report



RECORD INFORMATION 



Gene ID: 2081
Sequence ID: 2143
Protein ID: 2125
Sequence name: thrombospondin 4
Genome: nucleus
Taxon: Homo sapiens
Locus: 2081
Common Name: thrombospondin 4
Role ID: 40



Coding sequence length: 2886 nt

Transcript sequence length: 3074 nt

Expression data: THC163397 



ACCESSION DATA 

HT2143 is derived from accession(s):

SP:P35443 (THROMBOSPONDIN 4 PRECURSOR.)
GB:Z19585 (thrombospondin-4 {Homo sapiens})
GB:Z19585 (H. sapiens mRNA for thrombospondin-4)
PIR:A55710 (thrombospondin 4 precursor - human)



cDNA FEATURES 

Feature       End 5    End 3
coding_seq    28       2913
3'UT          2914     3074
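The coding start can be spot-checked by translating from position 28 of the nucleotide sequence in this report. The sketch below assumes the standard genetic code and 1-based coordinates; `FIRST_LINE` is the first 60 nt of the HT2143 nucleotide listing, and `CODON_TABLE` and `translate` are illustrative names rather than anything defined by the report.

```python
from itertools import product

# Standard genetic code, built from the conventional t/c/a/g ordering.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# First 60 nt of the HT2143 nucleotide sequence, as printed in the report.
FIRST_LINE = "gaatcccggggagcaggaagagccaacatgctggccccgcgcggagccgccgtcctcctg"
CDS_START = 28  # End 5 of coding_seq, assumed to be 1-based


def translate(nt):
    """Translate whole codons; codons with unknown letters become 'X'."""
    nt = nt.lower()
    usable = len(nt) - len(nt) % 3
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    print(translate(FIRST_LINE[CDS_START - 1:]))
```

Read this way, the first codon at position 28 is atg, and the 33 nt that remain on the line translate to MLAPRGAAVLL, which can be compared against the start of the protein listing.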



SEQUENCE 
nucleotide: 

gaatcccggggagcaggaagagccaacatgctggccccgcgcggagccgccgtcctcctg 
ctgcacctggtcctgcagcggtggctagcggcaggcgcccaggccaccccccaggtcttt
gaccttctcccatcttccagtcagaggctaaacccaggcgctctgctgccagtcctgaca 
gaccccgccctgaatgatctctatgtgatttccaccttcaagctgcagactaaaagttca 
gccaccatcttcggtctctactcttcaactgacaacagtaaatattttgaatttactgtg
atgggacgcttaagcaaagccatcctccgttacctgaagaacgatgggaaggtgcatttg 



Figure 2A 
gtggttttcaacaacctgcagctggcagacggaaggcggcacaggatcctcctgaggctg
agcaatttgcagcgaggggccggctccctagagctctacctggactgcatccaggtggat 
tccgttcacaatctccccagggcctttgctggcccctcccagaaacctgagaccattgaa 
ttgaggactttccagaggaagccacaggacttcttggaagagctgaagctggtggtgaga 
ggctcactgttccaggtggccagcctgcaagactgcttcctgcagcagagtgagccactg 
gctgccacaggcacaggggactttaaccggcagttcttgggtcaaatgacacaattaaac 
caactcctgggagaggtgaaggaccttctgagacagcaggttaaggaaacatcatttttg 
cgaaacaccatagctgaatgccaggcttgcggtcctctcaagtttcagtctccgacccca 
agcacggtggtcgccccggctccccctgcaccgccaacacgcccacctcgtcggtgtgac 
tccaacccatgtttccgaggtgtccaatgtaccgacagtagagatggcttccagtgtggg 
ccctgccccgagggctacacaggaaacgggatcacctgtattgatgttgatgagtgcaaa
taccatccctgctacccgggcgtgcactgcataaatttgtctcctggcttcagatgtgac 
gcctgcccagtgggcttcacagggcccatggtgcagggtgttgggatcagttttgccaag 
tcaaacaagcaggtctgcactgacattgatgagtgtcgaaatggagcgtgcgttcccaac 
tcgatctgcgttaatactttgggatcttaccgctgtgggccttgtaagccggggtatact 
ggtgatcagataaggggatgcaaagtggaaagaaactgcagaaacccagagctgaaccct 
tgcagtgtgaatgcccagtgcattgaagagaggcagggggatgtgacatgtgtgtgtgga 
gtcggttgggctggagatggctatatctgtggaaaggatgtggacatcgacagttacccc 
gacgaagaactgccatgctctgccaggaactgtaaaaaggacaactgcaaatatgtgcca 
aattctggccaagaagatgcagacagagatggcattggcgacgcttgtgacgaggatgct 
gacggagatgggatcctgaatgagcaggataactgtgtcctgattcataatgtggaccaa
aggaacagcgataaagatatctttggggatgcctgtgataactgcctgagtgtcttaaat 
aacgaccagaaagacaccgatggggatggaagaggagatgcctgtgatgatgacatggat 
ggagatggaataaaaaacattctggacaactgcccaaaatttcccaatcgtgaccaacgg 
gacaaggatggtgatggtgtgggggatgcctgtgacagttgtcctgatgtcagcaaccct 
aaccagtctgatgtggataatgatctggttggggactcctgtgacaccaatcaggacagt 
gatggagatgggcaccaggacagcacagacaactgccccaccgtcattaacagtgcccag 
ctggacaccgataaggatggaattggtgacgagtgtgatgatgatgatgacaatgatggt 
atcccagacctggtgccccctggaccagacaactgccggctggtccccaacccagcccag 
gaggatagcaacagcgacggagtgggagacatctgtgagtctgactttgaccaggaccag 
gtcatcgatcggatcgacgtctgcccagagaacgcagaggtcaccctgaccgacttcagg 
gcttaccagaccgtgggcctggatcctgaaggggatgcccagatcgatcccaactgggtg 
gtcctgaaccagggcatggagattgtacagaccatgaacagtgatcctggcctggcagtg 
gggtacacagcttttaatggagttgacttcgaagggaccttccatgtgaatacccagaca 
gatgatgactatgcaggctttatctttggctaccaagatagctccagcttctacgtggtc 
atgtggaagcagacggagcagacatattggcaagccaccccattccgagcagttgcagaa 
cctggcattcagctcaaggctgtgaagtctaagacaggtccaggggagcatctccggaac 
tccctgtggcacacgggggacaccagtgaccaggtcaggctgctgtggaaggactccagg 
aatgtgggctggaaggacaaggtgtcctaccgctggttcctacagcacaggccccaggtg 
ggctacatcagggtacgattttatgaaggctctgagttggtggctgactctggcgtcacc 
atagacaccacaatgcgtggaggccgacttggcgttttctgcttctctcaagaaaacatc
atctggtccaacctcaagtatcgctgcaatgacaccatccctgaggactcccaagagttt 
caaacccagaatttcgaccgcttcgataattaaaccaaggaagcaatctgtaactgcctt 
tcggaacactaaaaccatatatattttaacttcaattttctttagcttttaccaacccaa 
atatatcaaaacgttttatgtgaatgtggcaataaaggagaagagatcatttttaaaaaa 
aaaaaaaaaaaaaa 

protein: 

MI^PRGAAVT^LLHLVLQRWLAAGAQATPQVFDLLPSS^^ 

I STFKXQTKS SAT I FGLYSS TDNS KYFEFTVMGRLS KAI LRYLKNCGKVHL WFNNLQ IoA 
DGRRHRILLRLSNLQRGAGSLELYLDCIQVDSVHNLPRAr AGPSQKPETIELRTFQRKPQ 
DFLEELKLVVRGSLFQVASLQDCFLQQSEPIiAATGTGDFNRQFLGQMTQLNQLLGEVKDL 
LRQQVKETS FLRNT I AECQACGPLKFQS PTP STWAP AP PAP PTRP PRRCDS NP CFRG VQ 
CTDSRDGFQCGPCPEGYTGNGITCIDVDECKYHPCYPGVHCINLSPGFRCDACPVGFTGP 
MVQGVGISFAKSNKQVCTIDIDECRNGACVPNSICVNTLGSYRCGPCKPGYTGDQIRGCiO/ 
ERNCRNPELNPCSVNAQCIEERQGDVTCVCGVGWAGDGYICGKDVDIDSYPDEELPCSAR 
NCKXDNCKTTVPNSGQEDADRDGIGDACDEDAD 

DACDNCLSVLNTTOQKDTDGDGRGDACDDDMDGDGIKNILDNCPKFPNPJDQRDKjDGDGVGD 
ACDSCPDVSNPNQSDVDNDLVGDSCDTNQDSDGDGHQDSTDNCPTVINSAQLDTDKDGIG

DECDDDDDNDGIPDLVPPGPDNCRLVPNPAQEDSNSDGVGDICESDFDQDQVIDRIDVCP 

ENAEVTLTD FRAYQTVGLD P EGDAQ ID PNWWXjNQGME'I VQTMNS D PGLAVG YTAFNGVD 

FEGTFHVNTQTDDD YAGF I FGYQDS S S F YWMWKQTEQTYWQ ATP FRAVAEPG I QL KAVK 

SKTGPGEHLRNSLWHTGDTSDQVKLLWKIDSRNVGWKDKVSYRWFLQHRPQVGY 

GSELVADSGVTIDTTMHGGRLGVFCFSQENIIWSNIjKYRC^TIPEDFQEFQ 

N 



Figure 2C 